NSF PAR Search | NSF Public Access Repository

VCR: Interpretable and interactive debugging of object detection models with visual concepts

https://doi.org/10.1016/j.is.2025.102652

Xu, Jie Jeff; Dhanani, Saahir; Ono, Jorge Piazentin; He, Wenbin; Ren, Liu; Rong, Kexin (December 2025, Information Systems)
Shraga, Roee; Nandi, Arnab (Eds.)
InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions

https://doi.org/10.1111/cgf.70112

Chen, Juntong; Wu, Jiang; Guo, Jiajing; Mohanty, Vikram; Li, Xueming; Ono, Jorge Piazentin; He, Wenbin; Ren, Liu; Liu, Dongyu (June 2025, Computer Graphics Forum)

The rise of Large Language Models (LLMs) and generative visual analytics systems has transformed data‐driven insights, yet significant challenges persist in accurately interpreting users' analytical and interaction intents. While language inputs offer flexibility, they often lack precision, making the expression of complex intents inefficient, error‐prone, and time‐intensive. To address these limitations, we investigate the design space of multimodal interactions for generative visual analytics through a literature review and pilot brainstorming sessions. Building on these insights, we introduce a highly extensible workflow that integrates multiple LLM agents for intent inference and visualization generation. We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses. By employing effective prompt engineering and contextual interaction linking, alongside intuitive visualization and interaction designs, InterChat bridges the gap between user interactions and LLM‐driven visualizations, enhancing both interpretability and usability. Extensive evaluations, including two usage scenarios, a user study, and expert feedback, demonstrate the effectiveness of InterChat. Results show significant improvements in the accuracy and efficiency of handling complex visual analytics tasks, highlighting the potential of multimodal interactions to redefine user engagement and analytical depth in generative visual analytics.
VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis

https://doi.org/10.1111/cgf.70125

Yan, Xinyuan; Xuan, Xiwei; Ono, Jorge Piazentin; Guo, Jiajing; Mohanty, Vikram; Kumar, Shekar Arvind; Gou, Liang; Wang, Bei; Ren, Liu (June 2025, Computer Graphics Forum)

AdaWM: Adaptive World Model based Planning for Autonomous Driving

Wang, Hang; Ye, Xin; Tao, Feng; Pan, Chenbin; Mallik, Abhirup; Yaman, Burhaneddin; Ren, Liu; Zhang, Junshan (April 2025, https://iclr.cc/)

World-model-based reinforcement learning (RL) has emerged as a promising approach for autonomous driving, which learns a latent dynamics model and uses it to train a planning policy. To speed up the learning process, the pretrain-finetune paradigm is often used, where online RL is initialized by a pretrained model and a policy learned offline. However, naively performing such initialization in RL may result in dramatic performance degradation during the online interactions in the new task. To tackle this challenge, we first analyze the performance degradation and identify two primary root causes therein: the mismatch of the planning policy and the mismatch of the dynamics model, due to distribution shift. We further analyze the effects of these factors on performance degradation during finetuning, and our findings reveal that the choice of finetuning strategies plays a pivotal role in mitigating these effects. We then introduce AdaWM, an Adaptive World Model based planning method, featuring two key steps: (a) mismatch identification, which quantifies the mismatches and informs the finetuning strategy, and (b) alignment-driven finetuning, which selectively updates either the policy or the model as needed using efficient low-rank updates. Extensive experiments on the challenging CARLA driving tasks demonstrate that AdaWM significantly improves the finetuning process, resulting in more robust and efficient performance.
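The two steps named in the abstract, mismatch identification followed by selectively finetuning either the policy or the world model with low-rank updates, can be illustrated with a minimal sketch. It assumes NumPy, uses hypothetical mean-squared-error proxies for the two mismatches, and uses a LoRA-style factorization W + BA for the low-rank update. The names `dynamics_mismatch`, `policy_mismatch`, `select_update`, and `LowRankLinear` are illustrative only and are not taken from the paper, whose actual metrics and update rule are not given here.

```python
import numpy as np

def dynamics_mismatch(pred_next, true_next):
    """Hypothetical proxy: mean squared one-step prediction error of the world model."""
    return float(np.mean((pred_next - true_next) ** 2))

def policy_mismatch(pretrained_actions, target_actions):
    """Hypothetical proxy: mean squared gap between pretrained-policy and target actions."""
    return float(np.mean((pretrained_actions - target_actions) ** 2))

def select_update(model_score, policy_score):
    """Hypothetical rule: finetune only the component with the larger mismatch."""
    return "model" if model_score > policy_score else "policy"

class LowRankLinear:
    """Frozen weight W plus trainable rank-r factors B (out x r) and A (r x in)."""

    def __init__(self, W, rank, rng):
        self.W = W  # pretrained weight, kept frozen
        out_dim, in_dim = W.shape
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))  # zero init: the update starts as a no-op

    def forward(self, x):
        # x has shape (batch, in_dim); effective weight is W + B @ A
        return x @ (self.W + self.B @ self.A).T

    def num_trainable(self):
        # Only A and B would be updated during finetuning.
        return self.A.size + self.B.size
```

For a 64 × 64 layer at rank 4, only 512 parameters (A and B) would be trainable instead of 4,096, which is the usual efficiency argument for low-rank finetuning.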
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects

Kumar, Abhinav; Guo, Yuliang; Huang, Xinyu; Ren, Liu; Liu, Xiaoming (June 2024, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition)

3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

Ge, Yunhao; Yu, Hong-Xing; Zhao, Cheng; Guo, Yuliang; Huang, Xinyu; Ren, Liu; Itti, Laurent; Wu, Jiajun (December 2023, NeurIPS 2023)

A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method in complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, and sizes) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object models and achieves state-of-the-art performance. For the first time, we demonstrate that a physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection.
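The first step described in the abstract, choosing locations and poses that avoid collisions with the existing room layout, can be illustrated with simple rejection sampling. This is a minimal sketch under hypothetical simplifications: footprints are axis-aligned rectangles on the floor plane, yaw is limited to multiples of 90°, and `Box2D` and `sample_placement` are illustrative names. It is not the paper's procedure, which the abstract does not spell out.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Box2D:
    """Axis-aligned footprint on the floor plane (hypothetical units: meters)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box2D") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0 or
                    self.y1 <= other.y0 or other.y1 <= self.y0)

def sample_placement(room, obstacles, width, depth, rng, max_tries=1000):
    """Rejection-sample a collision-free footprint inside the room.

    Yaw is restricted to multiples of 90 degrees (hypothetical simplification),
    so a rotated footprint is just width and depth swapped.
    Returns (box, yaw) or None if no valid placement was found.
    """
    for _ in range(max_tries):
        yaw = rng.choice([0, 90, 180, 270])
        w, d = (width, depth) if yaw in (0, 180) else (depth, width)
        if w > room.x1 - room.x0 or d > room.y1 - room.y0:
            continue  # object does not fit the room at this orientation
        x0 = rng.uniform(room.x0, room.x1 - w)
        y0 = rng.uniform(room.y0, room.y1 - d)
        candidate = Box2D(x0, y0, x0 + w, y0 + d)
        if not any(candidate.overlaps(o) for o in obstacles):
            return candidate, yaw
    return None
```

Rejection sampling keeps every accepted placement collision-free by construction; the trade-off is that it can return `None` in a cluttered room, in which case a caller would need to try a smaller object or a different scene.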
3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

Ge, Yunhao; Yu, Hong-Xing; Zhao, Cheng; Guo, Yuliang; Huang, Xinyu; Ren, Liu; Itti, Laurent; Wu, Jiajun (December 2023, Advances in Neural Information Processing Systems (NeurIPS))

Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

https://doi.org/10.1109/CVPR52688.2022.01448

Merrill, Nathaniel; Guo, Yuliang; Huang, Xinyu; Zuo, Xingxing; Leutenegger, Stefan; Ren, Liu; Huang, Guoquan (June 2022, 2022 Conference on Computer Vision and Pattern Recognition)

Controlling the Amount of Verbatim Copying in Abstractive Summarization

Song, Kaiqiang; Wang, Bingqing; Feng, Zhe; Ren, Liu; Liu, Fei (January 2020, Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI))
